Leveraging three-tier deep learning model for environmental cleaner plants production

The world's population is expected to exceed 9 billion people by 2050, necessitating a 70% increase in agricultural output and food production to meet demand. Given resource shortages, climate change, the COVID-19 pandemic, and harsh socioeconomic forecasts, such demand is difficult to meet without computational and forecasting methods. Machine learning has grown alongside big data and high-performance computing technologies, opening new data-intensive scientific opportunities in the multidisciplinary agri-technology area. Throughout a plant's development, from seed production to seedling growth, diseases and pests are natural hazards. This paper introduces an early diagnosis framework for plant diseases based on fog and edge computing, using IoT sensor measurements and communication technologies. The effectiveness of pre-trained CNN architectures as feature extractors for identifying plant diseases is studied, with the standard pre-trained AlexNet model employed as the feature extractor. The extracted deep features are reduced by a revised version of the grey wolf optimization (GWO) algorithm, whose efficiency was confirmed through experiments. The selected feature subset is used to train an SVM classifier. Ten datasets covering different plants are used to assess the proposed model. According to the findings, the proposed model achieved better outcomes on all datasets. Averaged over all datasets, the accuracy of the proposed model is 93.84, compared to 85.49, 87.89, and 87.04 for AlexNet, GoogleNet, and the SVM, respectively.

The third fold is using a fog environment to compute all necessary tasks of image preprocessing, visualization, monitoring, and local decision support for detection and prediction. Fog computing is a rapidly evolving technology that extends and assists cloud computing. Its proximity to edge users, openness, and mobility make fog computing platforms ideal for delivering services to users quickly and improving the quality of service (QoS) of Internet of Things devices. IoT-based customer applications involving real-time activities in agriculture increasingly rely on this method 10.
Lastly, a novel version of the grey wolf optimization (GWO) algorithm is developed to select the important features fed to the classifiers. Selecting the relevant features is important for accelerating the prediction models while maintaining fair accuracy. The selected features are fed to the SVM and compared to the standard model, which uses all the features from AlexNet.
The remainder of the paper is structured as follows: "Related work" reviews recent studies. An overview of the basic concepts and methods used in this paper is presented in "Methods and overviews". "Proposed methodology" describes the suggested methodology in detail. The experimental settings and results are shown in "Experimental results and discussion". "Conclusion and future work" concludes the paper and outlines future directions.

Related work
Abbas et al. 11 presented a deep learning technique for tomato disease diagnosis. To categorize tomato leaf images into ten disease categories, DenseNet121 was trained on real and synthetic images using transfer learning. The approach attained accuracies of 97.11%, 98.65%, and 99.51% for classifying leaf images into 10, 7, and 5 classes, respectively.
Thenmozhi and Reddy 12 proposed an efficient CNN approach in which transfer learning is applied to obtain the best or a desired performance from the pre-trained model. Three public insect datasets were used to classify insect species, with accuracy rates of 96.75%, 97.47%, and 95.97%, respectively. Wiesner-Hanks et al. 13 used community data to train a CNN and fed its output into a conditional random field (CRF) to divide images into lesion and non-lesion areas, with an accuracy of 0.9979 and an F1 score of 0.7153.
Too et al. 14 used DenseNets, whose accuracy tended to keep improving as the number of iterations increased, with no evidence of performance decay or overfitting. An accuracy of 99.75% was achieved for plant disease classification. Chen et al. 15 presented a CNN architecture based on a sliding window to build a framework for location regression, recognition of pest species and plant diseases, feature fusion, and automatic feature learning; the identification rate for 38 common symptoms was 50-90%. Zhou et al. 16 demonstrated a rapid approach for detecting rice diseases based on the combination of Faster R-CNN and FCM-KM. Based on application results on 3010 images, the detection accuracy and time for sheath blight, bacterial blight, and rice blast were 98.26%/0.53 s, 97.53%/0.82 s, and 96.71%/0.65 s, respectively.
Sethy et al. 17 presented 5932 on-field images of four kinds of rice leaf diseases: brown spot, bacterial blight, tungro, and blast. The effectiveness of eleven CNN architectures was assessed under both the deep feature plus SVM approach and the transfer learning approach. According to the experimental findings, ResNet50 deep features with SVM outperform the transfer learning counterpart in classification. Rahman et al. 18 developed deep learning-based methods for identifying diseases and pests in rice plant images. A two-stage small CNN architecture was developed and compared to SqueezeNet, NasNet Mobile, and MobileNet. The simulation findings demonstrated that the suggested architecture could attain the desired accuracy of 93.3%.
Guo et al. 19 presented a deep learning-based mathematical model for plant disease detection and recognition. The model was tested on diseases such as rust, black rot, and bacterial plaque. The results indicated a model accuracy of 83.57%, which is higher than that of the previous technique, reducing the impact of disease on agricultural productivity and benefiting agriculture's long-term development. Atila et al. 20 presented the EfficientNet model for plant leaf disease classification and compared its performance to existing deep learning techniques. The experimental findings revealed that the B4 and B5 variants of EfficientNet attained the highest rates on the original and augmented datasets, with accuracies of 99.91% and 99.97% and precisions of 98.42% and 99.39%, respectively.
Table 1 summarizes the role of ML/DL in plant disease classification in agriculture, using the accuracy reported by various authors. Most recent works use the PlantVillage dataset and deploy a set of pre-trained CNN models. In this paper, new datasets are used to test our proposed architecture for plant disease classification.
The next section gives an overview of the problem statement and the methods used in this paper.

Overviews
Climate change, population growth, and food security concerns have pushed the sector to explore more creative approaches to protecting and improving agricultural yield. As a result, artificial intelligence (AI) is steadily emerging as a component of the industry's technological growth 31. Popular applications of traditional machine learning algorithms in agriculture are:
• Recognition/harvesting of vegetables and fruits.
• Land cover classification.
Compared with the well-defined segmentation, detection, and classification tasks in computer vision, the requirements for detecting pests and plant diseases are quite broad. They may be classified into three categories: what, where, and how. Although the functional requirements and aims of the three stages of plant disease and pest detection are distinct, the three stages are mutually inclusive 32. The "what" in the first stage corresponds to the classification task in computer vision. Classification describes the image globally through feature expression and then decides, through the classification process, whether the image contains a certain type of object. While structure learning is the primary research direction in object detection, feature expression is the primary research direction in classification tasks.
Machine learning (ML) has developed alongside high-performance computing and big data technologies, opening new avenues for unraveling, quantifying, and understanding data-intensive processes in agricultural operational contexts. ML gives machines the capacity to learn without being explicitly programmed 33. Convolutional neural networks (CNNs) are more complex to construct than traditional neural networks, but they are simpler to use: with this kind of network, image features do not have to be extracted separately. In image classification problems, complex pre-trained CNNs with millions of parameters are frequently used. Training them from scratch is difficult because it is time-consuming and labor-intensive 34. With developments in ML principles, significant gains in agricultural activities have been observed. The capacity to extract features automatically makes deep learning (DL), especially CNNs, adaptable, and it achieves human-level accuracy in a variety of agricultural applications, prominent among which are crop/plant recognition, fruit counting, land cover classification, weed/crop discrimination, and plant disease detection and classification 35.

Methods
Transfer learning. Data Transfer Learning (DTL) is a strategy in which knowledge derived from one set of data is transferred to solve different but related tasks when training a CNN, including on new data that often comes from a smaller population 36. Transfer learning was used to initialize and pre-train two deep convolutional neural network models: AlexNet and GoogleNet.
AlexNet is regarded as the first deep CNN architecture to achieve remarkable results in image recognition and classification 37. To overcome hardware constraints and exploit the full capacity of the deep CNN, AlexNet was trained on two parallel GPUs. In AlexNet, the CNN depth was extended from the five layers of the LeNet CNN to eight layers, making the CNN applicable to different image data sets. Dropout, ReLU, and pre-processing are the major attributes behind its significant improvement in computer vision applications. The eight layers are five convolutional layers, two fully connected hidden layers, and one fully connected output layer, as shown in Fig. 1 38.
In this study, we replaced the 1000 classes of the original AlexNet with only the 2 classes evaluated in this paper, healthy images and diseased images of 10 different plants, as illustrated in the dataset description.
GoogleNet is a 22-layer deep CNN that is a variant of the Inception network developed by Google researchers. Its design resolved many constraints that arise in large networks, primarily through the use of the Inception module. The structure diagram of the GoogleNet network is shown in Fig. 2 39.
GoogleNet consists of Inception modules, so its architecture is complex. GoogleNet is regarded as one of the first CNN architectures to move away from simply stacking convolution and pooling layers sequentially. In addition, GoogleNet takes storage and power into consideration, since stacking all layers and adding many filters would take computation time and result in higher memory costs 40.

Support vector machine
The deep feature extraction technique requires training a classifier on the extracted features. Vapnik's SVM was used as the classifier in this study 41. The SVM classifier has been found to outperform others in several agricultural image classification tasks.
The support vector machine is a linear or non-linear classifier capable of distinguishing between two different types of objects. SVMs are machine learning approaches based on convex optimization that operate according to the principle of structural risk minimization. These approaches are distribution-free, as they do not need any information about the joint distribution functions 42. SVM training is illustrated in Algorithm 1 43.
Algorithm 1 SVM Training
Require: X and y loaded with labeled training data, α ⇐ untrained or partially trained SVM
1: C ⇐ some value (for example 10)
2: repeat
3:   for all {xi, yi}, {xj, yj} do
4:     Optimize αi and αj
5:   end for
6: until there are no changes in α or another stopping criterion is met
Ensure: Keep only the support vectors (αi > 0)

Although many hyperplanes can distinguish between two classes, the one with the largest margin is taken as the most effective separating hyperplane. The objective of SVM is to construct this optimal hyperplane from the training set 40.
The main idea behind using SVM to solve a classification problem is to find the hyperplane that best separates the data of two groups. The output of a linear SVM is given in Eq. (1), where $\vec{w}$ is the hyperplane's normal vector and $\vec{x}$ is the input vector. Margin maximization may be treated as an optimization problem: minimize Eq. (2) subject to Eq. (3), where $y_i$ and $\vec{x}_i$ are the correct SVM output and the input vector for the ith training sample, respectively 44. SVM is a binary classifier that distinguishes only between two classes and does not directly handle multi-class classification. One approach to multi-class classification with SVMs is to build a set of one-versus-one classifiers and predict the class chosen by the majority of classifiers 45. Although this requires K(K − 1)/2 classifiers for a problem with K classes, the training time of the classifiers may decrease because the training set for each classifier is smaller. In this article, SVM is used for classification alongside CNN techniques such as AlexNet and GoogleNet.
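Eqs. (1)-(3) are not reproduced in the text above. A standard hard-margin formulation consistent with the surrounding definitions is sketched below; the bias term $b$ and the sign convention are assumptions, since the text names only $\vec{w}$, $\vec{x}$, and $y_i$.

```latex
\begin{align}
u &= \vec{w}\cdot\vec{x} - b \tag{1}\\
\min_{\vec{w},\,b}\quad & \tfrac{1}{2}\,\lVert \vec{w} \rVert^{2} \tag{2}\\
\text{subject to}\quad & y_i\,(\vec{w}\cdot\vec{x}_i - b) \ge 1 \quad \text{for all } i \tag{3}
\end{align}
```

Under this convention the distance between the two supporting hyperplanes is $2/\lVert\vec{w}\rVert$, so minimizing (2) is equivalent to maximizing the margin.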

Grey wolf optimization (GWO) algorithm
Mirjalili et al. proposed the grey wolf optimizer (GWO) as a novel swarm intelligence method 46. The GWO method has been applied successfully in a variety of studies. The primary inspiration for the GWO algorithm is the social hunting behavior of grey wolves in nature. Figure 3 depicts the social hierarchy as well as an example of the position update process 47.
In the GWO algorithm, the first, second, and third best solutions are the alpha (α), beta (β), and delta (δ). The remaining candidate solutions are treated as omega (ω). During the hunting process, the behavior of the wolves can be expressed mathematically as in Eqs. (4-8), where x is the grey wolf's position vector, xp is the prey's position vector, D is the distance between x and xp, t is the current iteration number, and A and C are coefficient vectors applied through component-wise multiplication.
The parameter A takes a random value in [−a, a], depending on the value of a. The value of A determines whether a grey wolf attacks or not. When |A| < 1, the grey wolf is very close to the prey and can attack at any time. When |A| > 1, the grey wolf abandons the prey in search of a new one. Another critical control parameter, C, is the exploration component of the algorithm and takes random values in the range [0, 2]. This parameter gives the algorithm a random behavior that keeps the search from stalling at local optima. The random effect is weakened when |C| < 1 and strengthened when |C| > 1 47.
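The equations referred to as Eqs. (4-8) are not reproduced in the text. For reference, the standard GWO encircling equations that match the symbol definitions above are sketched here. Assigning them the numbers (4)-(8) in this order is an assumption, as are the symbols $\vec{r}_1$, $\vec{r}_2$ (random vectors with components uniform in $[0, 1]$), $T$ (the maximum number of iterations), and $\circ$ (component-wise multiplication).

```latex
\begin{align}
\vec{D} &= \left| \vec{C} \circ \vec{x}_p(t) - \vec{x}(t) \right| \tag{4}\\
\vec{x}(t+1) &= \vec{x}_p(t) - \vec{A} \circ \vec{D} \tag{5}\\
\vec{A} &= 2a\,\vec{r}_1 - a \tag{6}\\
a &= 2 - \frac{2t}{T} \tag{7}\\
\vec{C} &= 2\,\vec{r}_2 \tag{8}
\end{align}
```

With these definitions each component of $\vec{A}$ lies in $[-a, a]$ and each component of $\vec{C}$ lies in $[0, 2]$, which agrees with the ranges stated above.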
To mimic grey wolf hunting behavior, Eqs. (9-14) describe how positions are updated with respect to the α, β, and δ wolves. It is assumed that the α, β, and δ wolves are closest to the prey and attract the rest of the wolves to the prey area. The grey wolf population can use the following formulae to determine the prey position: The locations determined from Eqs. (12-14) are used to compute the next location of the wolves by Eq. (15), where x(t + 1) is the location at the next iteration. Using Eq. (15) to find a new location from the leading wolves drives the omega wolves to change their locations and converge on the prey.
The GWO algorithm sequence consists of four steps: initialization, fitness calculation, position updates of the swarm individuals, and generation of the best result. The optimization process starts by assigning initial values to all control parameters and initializing all grey wolves. The fitness function is then calculated on the initial data, and the three best solutions are identified as the alpha, beta, and delta wolves. Next, the positions of all grey wolves other than the alpha, beta, and delta are updated. The control parameter values are then renewed together with all grey wolves' positions, followed by the alpha, beta, and delta wolves. Finally, the position of the alpha wolf is returned as the optimal value.
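The sequence above can be illustrated with a minimal sketch of the standard GWO loop. This is not the authors' MATLAB implementation: the function names, the toy sphere objective, the bound clipping, and the default population size and iteration count are all assumptions made for illustration, and the three leaders are kept across iterations as the best solutions seen so far.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def gwo_minimize(fitness, dim, bounds, n_wolves=20, max_iter=200, seed=0):
    """Minimal standard GWO for minimization; returns (best_score, best_position)."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialization: random positions inside the search bounds.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # up to three (score, position) pairs: alpha, beta, delta

    for t in range(max_iter):
        # Fitness calculation; keep the three best solutions seen so far.
        scored = [(fitness(w), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]

        # Control parameter a decreases linearly from 2 towards 0.
        a = 2.0 - 2.0 * t / max_iter

        # Position update: average of the three leader-guided estimates.
        for w in wolves:
            for d in range(dim):
                estimate = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a   # in [-a, a]
                    C = 2.0 * rng.random()           # in [0, 2]
                    D = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * D
                # Clip to the bounds (an implementation choice, not from the text).
                w[d] = min(hi, max(lo, estimate / 3.0))

    return leaders[0]


best_score, best_position = gwo_minimize(sphere, dim=3, bounds=(-10.0, 10.0))
print(best_score, best_position)
```

The fixed seed only makes the sketch reproducible; in practice the optimizer would be run several times with different seeds.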

Fog computing and IoT
Cloud computing providers typically rely on data centers that take a variety of factors into account, including energy consumption and user proximity. Thus the cloud layer, the top layer, usually comprises a cloud infrastructure of data centers that provide resources and services assigned dynamically according to user demand. These services may include networking, storage, and server capabilities (rendering tools, computational power, and so on) 48. Fog computing aims to bring processing capabilities closer to end users, preventing overuse of cloud resources, lowering computational burdens, enhancing load balancing, and shortening wait times 49,50.
The Internet of Things (IoT), which represents the future of communications and computing, is a breakthrough technology. IoT is now used in almost every sector, including smart cities, intelligent traffic control, and smart homes. Its deployment is wide and may extend to any field. In agriculture, IoT aids in better resource and crop management, crop monitoring, cost-effective farming, and increased quantity and quality. IoT sensors for air temperature, soil moisture, soil pH, water volume, humidity, and other quantities are employed 47. Figure 4 shows IoT in agriculture using edge computing, fog computing, and cloud computing.
The key benefits of IoT in agriculture are summarized in these points 51:
• Community agriculture in rural and urban regions, utilizing software and hardware resources as well as vast amounts of data.
• Quality and logistical traceability for food security, which reduces costs through real-time decision-making data.
• Business strategies established in the agricultural setting that enable direct consumer contact.
• Crop surveillance, which allows cost savings and prevention of machinery theft.
• Automatic irrigation systems that operate based on soil moisture levels and temperature measured by sensors.
• Environmental characteristics collected automatically via sensor networks for subsequent analysis and processing.
• Large quantities of data analyzed by decision support systems to increase production and operational efficiency.
To summarize this section, the paper has three folds. The first is applying DL models (AlexNet, GoogleNet) to extract features from plant images. The second is using an optimization algorithm, the modified grey wolf optimization algorithm, to eliminate redundant features. The third is classifying the images using the support vector machine. These techniques are split so that some processes run in the fog and others in the cloud. The next section introduces the architecture of the proposed solution using the deep learning techniques referred to above.

Proposed methodology
As technology advances, smart agricultural solutions are becoming more prevalent, and technology has returned to agriculture with the latest trends and techniques. A significant advantage of smart agriculture is that it connects to existing 3G and 4G networks using existing hardware and software. This speeds up hardware setup for smart agriculture solutions, resulting in various successful implementations of IoT in agriculture that can run in a fog or cloud environment. The existing mobile computing scenario of smartphones and their apps will evolve toward connecting the devices around us to help solve real-world problems 52. In this section, we discuss the proposed methodology, which builds on the transfer learning and pre-training methods and the optimization algorithm described above, running on fog and cloud computing with the IoT sensors common to the problem addressed in this paper.
Figure 5 shows the block diagram of the proposed IoT smart agriculture network architecture, which consists of three layers. The first layer contains the IoT devices used for different purposes in agriculture. Many technologies are used in IoT agricultural solutions and play an important role in modernizing IoT agriculture services; examples are cloud and edge computing, machine learning and big data analytics, communication networks and protocols, and robotics. The second layer represents the workflow of this paper: collecting the images from IoT sensors, then preprocessing them (resizing, normalizing, or removing noise as needed) according to the algorithms recommended in this paper (CNN, SVM). All processing of the images, from collection to plant disease detection, is performed in the fog environment to benefit from the scalability and stability that fog computing offers. The third layer connects to cloud computing to rent resources for further, large-scale processing. Other proposed models are not suited to cloud or fog computing, so we propose a new model for plant disease detection using machine learning techniques and Internet of Things (IoT) sensors that can run in fog or cloud environments.
The proposed model depends on deep learning, transfer learning, and shallow machine learning. In deep learning, multiple hidden layers are stacked to learn object representations effectively. These layers require a training process that includes "fine-tuning", in which the DNN weights found during backpropagation are adjusted slightly. After an efficient training procedure, DL networks can extract features, classify, and make decisions effectively and accurately. In the proposed model, we use transfer learning to adapt different pre-trained CNN architectures to the datasets.
As seen in Fig. 6, the proposed model starts with a data acquisition layer in which images of different plants are collected. This acquisition procedure was entirely Wi-Fi enabled, meaning that the camera and the computer were linked to each other via the internet. In the preprocessing phase, the images are reconstructed and resized, since they are taken from various sources and their dimensions vary. In addition, the input layer of each model requires specific image dimensions. Therefore, the input image size is adjusted to fit the models used in this analysis.
The feature extraction layer comes after image enhancement and is the layer in which most of the calculations are carried out. The calculations include extracting features from the image data set while preserving the spatial relationship between image pixels. A pre-trained CNN, AlexNet, was used as the feature extractor.

Modified grey wolf optimization algorithm (MGWO)
Mirjalili showed that the GWO algorithm tends to become stuck at local optima because of the small number of control parameters used in its simplest form. For this reason, researchers have modified GWO by adding control parameters and changing control parameter values. According to their findings, the alpha wolf is more influential than the beta and delta wolves when searching for prey, and exploiting this yields better test results. Consequently, many studies in the literature have adapted and developed the grey wolf algorithm for various sectors 47.
In this study, the adjusted equation for the parameter a was used to improve the method significantly 50. Instead of the usual GWO Eq. (7), this study uses Eq. (16) to derive the parameter a. The only addition in Eq. (16) is s, which reflects the total number of individuals in the swarm. Standard GWO has a linearly decreasing a parameter, which helps prevent the algorithm from settling on local minima. Researchers found that letting a approach 0 not only keeps the algorithm from being trapped at a local minimum but also considerably strengthens it. The method therefore converges on the optimal values faster when this parameter is reduced from 2 to 0. Accordingly, in this study a is decreased parabolically from 2 to 0, which speeds up the algorithm.
Moreover, the governing Eq. (15) shows that the dominants play an equal role in the search: each grey wolf approaches or moves away from the prey according to an equally weighted average of the alpha, beta, and delta. Although the alpha is closest to the prey at first, it may be far from the final result. At the beginning of the search, therefore, only the alpha position should be considered in Eq. (15), or its weight should be substantially larger than those of the other dominants. The equal weighting in Eq. (15) also contradicts the grey wolf social hierarchy. If the pack's social hierarchy is strictly observed, the alpha is in charge, which could mean that he/she is always the closest to the prey. The alpha wolf's weight in Eq. (15) should therefore never be smaller than those of the beta and delta wolves, and likewise the beta's weight should never be smaller than the delta's. In light of these concerns, the authors 53 further hypothesize the following: (1) The dominants surround a hypothesized prey while it is being searched for, but they do not surround an actual prey when it is being hunted. As their social hierarchy dictates, the dominant grey wolves encircle the prey in order of dominance: the alpha is the closest wolf in the pack, followed by the beta and the delta. Omega wolves take part in this process, passing on their better positions to the dominants. (2) The alpha controls the search and hunting process, while the beta plays a minor role and the delta an even smaller one. A wolf's position changes if an alpha wins out over his/her peers.
Given these hypotheses, the position update should not treat the dominants equally, as Eq. (15) does. The alpha weight should be near 1.0 at the beginning, whereas the beta and delta weights could be close to 0. At the final stage, the alpha, beta, and delta wolves should surround the prey as described by Eq. (15). During the entire search, the beta always follows the alpha, and the delta always follows the beta because he/she is always ranked third. As a result, the beta and delta weights depend on the cumulative number of iterations: the alpha's weight should decrease, and the beta's and delta's weights should rise.
These ideas can be stated mathematically. First, the weights vary during the search, and their sum is always constrained to 1.0. As a result, Eq. (15) is altered to the following: Second, the alpha weight w1, beta weight w2, and delta weight w3 should always satisfy w1 ≥ w2 ≥ w3. Over the course of the search, the weight of the alpha changes from 1.0 to 1/3, while the weights of the beta and delta increase from 0.0 to 1/3. w1 can be described by a cosine function if the angle is limited to the range [0, arccos(1/3)]. Third, the weights should vary with the cumulative iteration number, "it". We know that w2, w3 ⟶ 0 when it = 0 and w1, w2, w3 ⟶ 1/3 when it ⟶ ∞. We therefore introduce an arc-tangent function, which varies from 0.0 to π/2. Furthermore, since cos(π/4) = sin(π/4) = √2/2, a different angular parameter φ is constructed as seen below 53: Given that w2 increases from 0.0 to 1/3 along with it, we assume that it contains sin θ and cos φ, with θ ⟶ arccos(1/3) and w2 ⟶ 1/3 when it ⟶ ∞; we can then formulate w2 in detail. Based on these principles, the following is a new position updating method with variable weights:

Output: Return the α (the optimal search-space solution)
1. Begin
2. ...
10. End for
11. ...
12. Generate the new value of a according to Eq. (16)
13. ...
15. End for
16. End
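The variable-weight idea described before the algorithm can be sketched as follows. Only the alpha weight is taken from the text (a cosine of an angle that an arc-tangent of the iteration count drives from 0 towards arccos(1/3)). The beta and delta weights here are a deliberate simplification that splits the remaining weight equally; this is an assumption, not the w2 formula of the cited work, which is built from sin θ and cos φ. The function names are hypothetical.

```python
import math

THETA_MAX = math.acos(1.0 / 3.0)  # upper limit of the angle range


def variable_weights(it):
    """Return (w1, w2, w3) for the cumulative iteration count `it` (it >= 0)."""
    if it < 0:
        raise ValueError("iteration count must be non-negative")
    # atan maps [0, inf) onto [0, pi/2); rescale it to [0, arccos(1/3)).
    theta = (2.0 / math.pi) * THETA_MAX * math.atan(it)
    w1 = math.cos(theta)  # alpha weight: falls from 1.0 towards 1/3
    # ASSUMPTION: the remaining weight is shared equally by beta and delta,
    # so w1 >= w2 >= w3 and w1 + w2 + w3 = 1 hold by construction.
    w2 = w3 = (1.0 - w1) / 2.0
    return w1, w2, w3


def weighted_position(x1, x2, x3, it):
    """Weighted replacement for the equal-weight average of the three estimates."""
    w1, w2, w3 = variable_weights(it)
    return [w1 * a + w2 * b + w3 * c for a, b, c in zip(x1, x2, x3)]


for it in (0, 1, 10, 1000):
    print(it, variable_weights(it))
```

At it = 0 the update follows the alpha estimate alone, and for large it the three weights approach 1/3 each, which recovers the equal-weight average of the standard update.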

Performance measures
True positives, true negatives, false positives, and false negatives are displayed separately in a table with two rows and two columns (also referred to as a confusion matrix). In this way, the proportion of correct classifications (accuracy) can be studied in greater detail. Accuracy alone leads to misleading conclusions on an unbalanced data collection (i.e., when the number of observations differs dramatically between classes), so sensitivity and specificity are also valuable measures. As shown in Table 2 54, the confusion matrix is used to compute the most widely used measures (Data science, 2019). Five measures are used in this article to gauge the performance of our work; they are shown in Table 3. We report the results as two experiments. In the first experiment, the modified grey wolf optimization method (MGWO) is evaluated for feature selection. Feature selection is increasingly important when developing a machine-learning model. The feature selection process involves deleting irrelevant or redundant features and picking the ideal subset of features that best categorizes patterns belonging to different plants. The evaluation uses fifteen standard feature selection datasets, an overview of which is given in Table 4 55. In the first part of the procedure, a random population of n wolves (search agents) is formed using a random seeding strategy. Each solution has d dimensions, where d equals the number of features in the original dataset. Features are selected so as to enhance classification accuracy: the relevant features are marked with one and the rest with zero. Initially, each solution is set to binary values (0 and 1).
A large part of GWO's success depends on how the initial population is generated. We use chaotic map initialization to increase the global convergence speed of the MGWO optimization process. Compared with standard random initialization, a chaotic map improves the balance between exploration and exploitation. The logistic map is one of the most effective chaos-based approaches. It is given in Eq. (28) 56.
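As an illustration of the confusion-matrix measures introduced at the start of this section, the sketch below computes them from the four counts. The choice of these five measures (precision, sensitivity, specificity, accuracy, F1) is an assumption based on the measures named in the text; the function name and the example counts are hypothetical.

```python
def classification_measures(tp, tn, fp, fn):
    """Compute common measures from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for a healthy/diseased classifier.
print(classification_measures(tp=40, tn=45, fp=5, fn=10))
```

Reporting sensitivity and specificity alongside accuracy is what guards against the unbalanced-class problem mentioned above.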

Precision: The ratio of correctly classified positive examples to the total number of examples predicted as positive. It shows the correctness achieved in positive prediction. Precision = TP / (TP + FP)

Accuracy: The ratio of correct predictions to the total number of predictions. Accuracy is the proximity of the measurements to a given value. Accuracy = (TP + TN) / (TP + TN + FP + FN) (26)

F1 score: The weighted average of precision and recall (sensitivity). F1 may be the right choice when trying to balance precision and recall.

$$x_{n+1} = \mu\, x_n (1 - x_n) \tag{28}$$

where μ is the bifurcation coefficient, set to 4, and $x_n$ is the nth chaotic variable, with $x_n$ ∈ (0, 1) provided that the initial value $x_0$ ∈ (0, 1) is not one of the fixed or periodic points (0, 0.25, 0.5, 0.75, 1). Figure 8 shows that the logistic map produces a uniformly distributed sequence, which keeps it from falling into short periodic cycles.
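A minimal sketch of chaotic initialization with the logistic map (μ = 4) follows. The function names, the starting value 0.7, and the 0.5 threshold used to turn chaotic values into an initial binary feature mask are assumptions for illustration; the text does not state how the chaotic sequence is mapped to the initial population.

```python
def logistic_sequence(x0, n, mu=4.0):
    """Generate n values of the logistic map x_{k+1} = mu * x_k * (1 - x_k)."""
    if not 0.0 < x0 < 1.0 or x0 in (0.25, 0.5, 0.75):
        raise ValueError("x0 must lie in (0, 1) and avoid 0.25, 0.5, 0.75")
    values = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_binary_population(n_wolves, n_features, x0=0.7, threshold=0.5):
    """Build an initial population of binary feature masks from one chaotic run."""
    values = logistic_sequence(x0, n_wolves * n_features)
    population = []
    for i in range(n_wolves):
        row = values[i * n_features:(i + 1) * n_features]
        # ASSUMPTION: a feature starts as selected when its value exceeds 0.5.
        population.append([1 if v > threshold else 0 for v in row])
    return population


for mask in chaotic_binary_population(n_wolves=4, n_features=8):
    print(mask)
```

Because the sequence is deterministic for a given starting value, different runs would need different admissible starting values to obtain different initial populations.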
Feature selection has more than one objective and is therefore treated as a multi-objective problem 57 . Two objectives must be met: the first is to achieve the highest accuracy rate, and the second is to reduce the number of selected features to a minimum. The fitness function used to evaluate each solution is therefore configured to balance both aims:

$$\mathrm{Fitness} = \alpha \, \gamma_R(D) + \beta \, \frac{|S|}{|D|}$$

where $\gamma_R(D)$ denotes the classification error rate, $|S|$ is the cardinality of the selected feature subset, $|D|$ is the cardinality of the dataset (its total number of features), and $\alpha \in [0, 1]$ and $\beta = 1 - \alpha$ are parameters that weight the classification quality and the number of selected features, respectively. To find the K nearest neighbors for the KNN classifier, the Euclidean distance 58 is calculated as follows:

$$D_E(P, Q) = \sqrt{\sum_{i=1}^{d} (Q_i - P_i)^2}$$

where $Q_i$ and $P_i$ are the values of the i-th feature in the two samples, i is the feature index, and d is the total number of features used in the analysis. Cross-validation is a popular strategy for reducing overfitting; cross-validation with K = 10 is used in this paper.
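A minimal sketch of a fitness of the kind described above, assuming the weighted-sum form that is common in wrapper-based feature selection: α times the classification error rate plus β = 1 − α times the fraction of features kept. The function names are hypothetical, and the default α = 0.9 is the value this paper reports for its experiments.

```python
import math


def fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Weighted sum of error rate and selected-feature ratio (lower is better)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)


def euclidean_distance(p, q):
    """Euclidean distance between two samples with the same number of features."""
    if len(p) != len(q):
        raise ValueError("samples must have the same number of features")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


# Example: 10% error rate with 5 of 20 features kept.
score = fitness(error_rate=0.10, n_selected=5, n_total=20)
```

With α = 0.9, the example evaluates to 0.9 × 0.10 + 0.1 × 0.25 = 0.115, so the error rate dominates and the feature count mainly separates solutions of similar accuracy.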
The positions of the search agents produced by the algorithm are continuous values rather than binary ones. They cannot be applied directly to our problem because they are incompatible with the standard binary encoding of feature selection, in which each feature is either selected (1) or not (0) so as to increase the performance and accuracy of the classification system. A transformation function is therefore used to map the continuous search space to a binary one. With the sigmoid function 57 , any continuous value can be converted into a binary one: for i = 1, …, d, the continuous value computed by the algorithm is passed through the S-shaped function to give $x_{si}$, which is then compared with a randomly selected value R ∈ [0, 1] to set the binary parameter $x_{binary}$ to 0 or 1.
All trials were conducted on a Windows 10 Pro 64-bit operating system with a Core(TM) i7-8550U CPU running at 1.80 GHz and 1.99 GHz. The algorithms were implemented in MATLAB (2018a). The parameter values of the algorithms were taken from the literature to ensure that the algorithms are compared on an equal basis 59 . The KNN classifier is a frequently used wrapper for feature selection; it is a supervised learning algorithm characterized by simple and quick learning. Each algorithm is run twenty times with a random seed. The maximum number of iterations for all subsequent experiments, which use standard k-fold cross-validation, is 20.
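The sigmoid-based conversion from continuous to binary positions can be illustrated as follows. This is a sketch under stated assumptions: it uses the plain logistic sigmoid without any scaling or shift, and it sets a bit to 1 when the random value R falls below the sigmoid output. The exact variant and comparison direction used by the authors may differ, and the function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)


def binarize_position(position, rng):
    """Map a continuous wolf position to a 0/1 feature mask."""
    bits = []
    for x in position:
        r = rng.random()  # R drawn uniformly from [0, 1)
        # Assumption: the feature is selected when R < sigmoid(x).
        bits.append(1 if r < sigmoid(x) else 0)
    return bits


rng = random.Random(42)
mask = binarize_position([2.3, -1.7, 0.0, 4.1], rng)
```

Larger continuous values make selection more likely, and a value of 0 gives an even chance, so the continuous search still steers which features tend to be kept.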
Multiple empirical experiments were conducted on a variety of datasets, alongside the values reported in the literature, to determine the best values for α and β. As a result, α is set to 0.9 and β to 0.1. The parameter settings of our experiments are shown in Table 5.
Tables 6 and 7 show the number of selected features and the classification accuracy, respectively. Results are reported for the standard grey wolf optimization (GWO), ant colony optimization (ACO), butterfly optimization algorithm (BOA), particle swarm optimization (PSO), and modified grey wolf optimization (MGWO) algorithms. They show the superiority of the proposed MGWO, which selects the fewest features on all datasets while producing a fair accuracy on most of them. These results are displayed graphically in Figs. 9 and 10.
Based on these results, we conclude that the MGWO can be used for our plant disease problem.

Experiment 2
The results of the first experiment indicate that the modified grey wolf optimization algorithm (MGWO) is effective as a wrapper feature selection algorithm. Experiment 2 addresses the core problem of plant disease classification and prediction. As discussed in the previous section, the first stage of the proposed model is feature extraction, for which the pre-trained AlexNet CNN is used. This process is performed for ten datasets. The second stage is feature selection, in which the MGWO is used as the wrapper feature selection method. Lastly, the reduced feature set is used to train the SVM. The datasets are described in the next subsection.
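The hand-off between the second and third stages can be pictured as applying a binary mask to each extracted feature vector before the classifier is trained. The sketch below assumes the features are held as plain lists of numbers and that the selection result is a 0/1 list of the same length; the function name and the toy values are hypothetical and do not come from the paper's datasets.

```python
def apply_feature_mask(feature_vectors, mask):
    """Keep only the columns whose mask entry is 1."""
    kept = [i for i, bit in enumerate(mask) if bit == 1]
    reduced = []
    for row in feature_vectors:
        if len(row) != len(mask):
            raise ValueError("feature vector and mask lengths differ")
        reduced.append([row[i] for i in kept])
    return reduced


# Toy example: two images, four extracted features each, two features kept.
features = [[0.2, 1.5, 0.0, 3.1],
            [0.7, 0.4, 2.2, 0.9]]
reduced = apply_feature_mask(features, [1, 0, 0, 1])
```

The reduced vectors, together with the healthy or diseased labels, would then be passed to the SVM training routine.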

Datasets description
Plants play a crucial role in climate regulation and erosion reduction, and they are necessary for preserving the environment, the ecosystem, and living beings. Deciduous and coniferous trees are the most common types. Compared with conifers, deciduous trees have broader and bigger leaves, which allow more photosynthesis to occur; their leaves fall off during the fall. Trees of this type are known for their high wood production. Coniferous, or evergreen, trees stay green throughout the year. Their leaves have a triangular form and grow upwards in most cases. Even though they have softer wood, they are quite durable and resistant to various weather conditions 60 . The dataset at https://data.mendeley.com/datasets/hb74ynkjcn/1 focuses on plants that contribute both ecologically and economically. Ten plants were selected: Jamun, Lemon, Sukh Chain, Arjun, Pomegranate, Jatropha, Mango, Saptaparni, Guava, and Chinar, as shown in Table 8. The images are divided into two categories: healthy and diseased. Table 9 describes the dataset.

Results
The proposed model (AlexNet for feature extraction, MGWO for feature selection, and the SVM as the classifier) achieved better results than AlexNet, GoogleNet, and the SVM. Table 10 compares AlexNet, GoogleNet, SVM, and the proposed model across several performance metrics.

Table 8. Sample of healthy and diseased leaf images of the plant's disease dataset.

Conclusion and future work
In this paper, we present a framework for the identification of plant diseases. First, a comparison is made of SVM, AlexNet, and GoogleNet-based transfer learning methods for detecting plant diseases, which are intended to run on edge servers with greater computational capability. Then, we propose a hybrid approach for plant disease detection and classification that combines AlexNet feature extraction, a modified grey wolf optimization algorithm that reduces the features produced by AlexNet, and a support vector machine classifier.
The proposed model can operate on resource-limited Internet of Things (IoT) devices within a framework that integrates fog and cloud computing. Experimental evidence on real-world datasets shows that the proposed model detects plant diseases accurately while using minimal computational resources. The proposed model performed better on most datasets. In the future, we hope to use blockchain technology to improve the fog environment without affecting the efficiency of feature map extraction.
We will also develop applications that detect plant diseases with deep learning to support smart agriculture.

Figure 3. GWO's social structure and position update method.

Figure 4. Smart agriculture IoT with edge, fog, and cloud computing.

Figure 5. Block diagram of the proposed IoT smart agriculture network architecture.

Figure 6. Dataflow diagram of the proposed methodology.

Figure 7. Flowchart of the grey wolf optimization algorithm.

Figure 8. Flowchart of logistic map for initialization.

Figure 9. The features reduction for different algorithms.

Figure 10. The classification accuracy for different algorithms.

Figure 11 displays a comparison of the models with respect to the accuracy metric. Fig. 12 compares the SVM trained directly on the features extracted by AlexNet, without feature selection, with the SVM trained on the features that the MGWO selected from those extracted by AlexNet. The ROC curve on the test set for the proposed model is presented in Fig. 13.

Figure 11. Classification accuracy for the four models.

Figure 12. Classification accuracy of the standard SVM vs. the proposed model.

Table 1. ML/DL for plant disease detection.

Table 2. The confusion matrix.

Table 4. Datasets used for evaluating the MGWO.

Table 6. The features reduction for different algorithms. Significant values are in bold.

Table 7. The classification accuracy for different algorithms. Significant values are in bold.

Table 9. Plants diseases datasets description.

Code Name Full name Name of the disease No. of healthy images No. of diseased images
Table 10 reports sensitivity, specificity, precision, F1-score, and accuracy. The proposed model achieved the highest accuracy on all datasets except the one named p2, on which GoogleNet achieved the best accuracy.

Table 10. Classification results for the four models.
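The five measures used to compare the models follow directly from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name is hypothetical, the counts in the example are invented for illustration and are not taken from Table 10, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification measures from confusion counts."""
    def ratio(numerator, denominator):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall, true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Reporting sensitivity and specificity alongside accuracy matters when the healthy and diseased classes differ in size, since accuracy alone can hide poor performance on the smaller class.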